Abstract

We propose a new topological invariant of unlabeled trees of N nodes. The invariant
is a set of N x 2 matrices of integers, with $\sum_j k^{d_{ij}}$ and $v_i$ as the matrix
elements, where $d_{ij}$ are the elements of the distance matrix, $v_i$ denotes the degree
of the i-th node, and $k \in \mathbb{N}$. To compare the invariants calculated for possibly
different graphs, the matrix rows are ordered with respect to the first column and, if
necessary, with respect to the second one. We use the new invariant to evaluate from
below the number of topologically different unlabeled trees up to N = 17. The results
slightly exceed the asymptotic evaluation of Otter.

1 Introduction

Averaging over different graphs is basic in numerous applications of graph
theory [1,2]. For such tasks, knowledge of the number of topologically different
graphs is of primary importance. Given two graphs, a typical question
is: are they different? If the graphs are labeled, the respective algorithms run in
polynomial time. For unlabeled graphs, however, one would have to check
all possible labelings, which makes the problem infeasible [3]. An alternative
solution is to find a quantity which is different for different graphs and has
the same value if the graphs are topologically equivalent. The latter means
that there is a one-to-one transformation from one graph to the other: each
pair of nodes linked (not linked) in one graph is linked (not linked) in the other
graph. Such a quantity is a topological invariant. In practice, however, we can
never be sure that a quantity proposed as the invariant indeed has the above
discriminating property. While different values certainly mean different graph
structures, equal values do not allow us to claim that the graphs are indeed
topologically identical. In many cases the proposed quantity turns out to be
degenerate, i.e. its value is the same for different graphs. All this remains true
for unlabeled trees, which are connected graphs without cyclic paths and without loops.

In a series of papers, Schultz et al. proposed and evaluated several scalar
quantities as candidate topological invariants for trees [4]. This work was
motivated by a chemical application of the constructed quantities, which were
found to increase monotonically with the melting temperature of alkanes.
However, almost all of the proposed invariants were found to be degenerate.
Moreover, the last proposed invariant is a real number rather than an integer,
so the comparison of its values must rely on numerical accuracy.

Here we propose a new candidate for a topological invariant of unlabeled trees.
Unlike the quantities discussed previously, it is a set of matrices and not a
single number. The advantage is that the matrices are ordered in a simple way,
and the ordering algorithm works in polynomial time. On the other hand, to
state that two trees are topologically identical we compare all the matrix
elements. This modification is expected to enhance the discriminative power of
the proposed invariant. We use the obtained criterion to calculate the number
of topologically non-equivalent trees of up to N = 17 nodes. As stated above,
the obtained numbers can be treated only as an evaluation of the true results
from below. Hence a better criterion, if one exists, should yield a greater
number of trees for N ≤ 17 than our result, given in Table 1.

Table 1

The number of trees T evaluated on the basis of sorted (b, v) pairs with k ≤ 6. T_O is
given by Otter's formula (1).

 N        T        T_O
 1        1        1.6
 2        1        0.8
 3        1        0.9
 4        2        1.3
 5        3        2.2
 6        6        4.0
 7       11        8.1
 8       23       17.2
 9       47       37.9
10      106       86.1
11      235      200.5
12      551      476.9
13     1301     1153.9
14     3159     2833.8
15     7741     7049.1
16    19320    17731.0
17    48629    45038.0

In the next section our numerical procedure is described in detail. Section 3
contains the numerical results. The obtained numbers of trees are compared
to the analytical evaluation of Otter [7]. In Section 4 we provide an argument
that the range of values of any good candidate for a topological invariant should
increase exponentially with the number of nodes N. Our proposition is the only
one we know of that fulfills this criterion. However, this 'criterion of range' is not
sufficient, in the sense that it does not exclude possible degeneracy.

2 Numerical approach 

Our numerical approach is based on the construction of the distance matrix
$D_N$ during tree growth [5]. In the distance matrix D, the element $d_{ij}$ gives the
length of the shortest path between nodes i and j, i.e. the minimal number of edges
which connect these vertices. The construction algorithm relies on the fact that
the distance from a newly added (N + 1)-th node to every other node $1 \le i \le N$,
via the node q to which the new node is attached, is $d_{N+1,i} = d_{q,i} + 1$. The
computational complexity of this construction of the distance matrix D is of
order $O(N^2)$. The number of entries equal to 1 in the i-th row gives the degree
$v_i$ of the i-th node.
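
The update rule above can be illustrated with a minimal Python sketch. It is not the authors' implementation; the function names `attach_node` and `degrees`, the list-of-lists representation, and the 0-based node indices are assumptions made for this example.

```python
def attach_node(dist, q):
    """Return the distance matrix after attaching a new leaf to node q.

    dist is a symmetric list-of-lists distance matrix (0-based indices).
    The new node gets the last index.
    """
    n = len(dist)
    # Rule from the text: d_{N+1,i} = d_{q,i} + 1 for every existing node i.
    new_row = [dist[q][i] + 1 for i in range(n)]
    # Append the new column to every old row, then append the new row itself.
    grown = [row + [new_row[i]] for i, row in enumerate(dist)]
    grown.append(new_row + [0])
    return grown


def degrees(dist):
    """Degree of node i = number of entries equal to 1 in row i."""
    return [row.count(1) for row in dist]


# Grow the path 1-2-3-4 from a single node by always extending the last node.
path4 = [[0]]
for q in (0, 1, 2):
    path4 = attach_node(path4, q)

for row in path4:
    print(row)
print("degrees:", degrees(path4))
```

Each call copies the N existing rows and builds one new row, which is consistent with the quadratic cost quoted above.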

For counting trees, two column vectors are useful. The first one, b, gives the
sum of a natural parameter $k \in \mathbb{N}$ raised to the power equal to the distance
$d_{ij}$ from the i-th node to node j: $b_i = \sum_{j=1}^{N} k^{d_{ij}}$. The second vector, v,
stores the node degrees $v_i$. These vectors form a matrix whose rows are sorted with
the key pair (b, v). Two trees are different if their sorted (b, v) matrices differ;
they are regarded as identical only if the matrices coincide for all considered values of k.
In practice we compare the matrices for k = 2, 3, 4, 5 and 6. We have checked
numerically that the results of tree counting are different for k ≤ 4 and k = 5,
but they are the same for k = 5 and k = 6. Sorting the rows of (b, v) makes
the matrix independent of the order in which the tree's nodes are labeled.

Fig. 1. Two trees of N = 4 nodes, (a) and (b). (Figures using Pajek [12].)

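For example, the only two existing trees for N = 4, presented in Fig. 1,
have the distance matrices $D_4$ [6]:

$$
D_4 = \begin{pmatrix} 0 & 1 & 2 & 3 \\ 1 & 0 & 1 & 2 \\ 2 & 1 & 0 & 1 \\ 3 & 2 & 1 & 0 \end{pmatrix}
\quad\text{and}\quad
D_4 = \begin{pmatrix} 0 & 1 & 2 & 2 \\ 1 & 0 & 1 & 1 \\ 2 & 1 & 0 & 2 \\ 2 & 1 & 2 & 0 \end{pmatrix},
$$

and the sorted pairs $(b, v)_4$ for k = 2:

$$
(b, v)_4 = \begin{pmatrix} 15 & 1 \\ 15 & 1 \\ 9 & 2 \\ 9 & 2 \end{pmatrix}
\quad\text{and}\quad
(b, v)_4 = \begin{pmatrix} 11 & 1 \\ 11 & 1 \\ 11 & 1 \\ 7 & 3 \end{pmatrix}.
$$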
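
As a check of the N = 4 example, the sorted (b, v) matrix can be computed directly from a distance matrix. The sketch below is illustrative only: the name `sorted_bv` is hypothetical, and sorting in descending order is an assumption read off the printed matrices (any fixed order would serve equally well).

```python
def sorted_bv(dist, k):
    """Rows (b_i, v_i) of a tree, sorted by b and then by v (descending).

    b_i = sum over j of k**d_ij (the term j = i contributes k**0 = 1),
    v_i = number of entries equal to 1 in row i of the distance matrix.
    """
    rows = [(sum(k ** d for d in row), row.count(1)) for row in dist]
    return sorted(rows, reverse=True)


# The two distance matrices D_4 of the example (path and star).
path4 = [[0, 1, 2, 3], [1, 0, 1, 2], [2, 1, 0, 1], [3, 2, 1, 0]]
star4 = [[0, 1, 2, 2], [1, 0, 1, 1], [2, 1, 0, 2], [2, 1, 2, 0]]

print(sorted_bv(path4, 2))  # [(15, 1), (15, 1), (9, 2), (9, 2)]
print(sorted_bv(star4, 2))  # [(11, 1), (11, 1), (11, 1), (7, 3)]

# Relabeling the nodes permutes rows and columns of D but not the sorted result.
perm = [2, 0, 3, 1]
relabeled = [[path4[i][j] for j in perm] for i in perm]
print(sorted_bv(relabeled, 2) == sorted_bv(path4, 2))  # True
```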



Now, the next generation of trees is produced, N → N + 1, by systematically
adding a new node to each node of all preexisting trees. The case
N = 4 → N = 5 is shown in Fig. 2. Among the eight cases only three classes of
$(b, v)_5$ exist, i.e.:

$$
(b, v)_5 = \begin{pmatrix} 31 & 1 \\ 31 & 1 \\ 17 & 2 \\ 17 & 2 \\ 13 & 2 \end{pmatrix},
\quad
(b, v)_5 = \begin{pmatrix} 23 & 1 \\ 19 & 1 \\ 19 & 1 \\ 13 & 2 \\ 11 & 3 \end{pmatrix}
\quad\text{and}\quad
(b, v)_5 = \begin{pmatrix} 15 & 1 \\ 15 & 1 \\ 15 & 1 \\ 15 & 1 \\ 9 & 4 \end{pmatrix}
$$

for k = 2. The three distance matrices $D_5$ of these three trees are needed for the
next step, i.e. N = 5 → N = 6. The procedure is repeated recursively.

Fig. 2. Eight possible trees obtained by adding one node by one link in all possible
ways to the two trees with N = 4. Among them only three, (a), (b) and (c), are different.
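
The growth-and-compare procedure of this section can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' C++ program: the names `attach_node`, `invariant` and `count_trees` are hypothetical, one representative distance matrix is kept per class, and the set k = 2, ..., 6 is taken from the text.

```python
KS = (2, 3, 4, 5, 6)  # values of k compared in the text


def attach_node(dist, q):
    """Distance matrix after attaching a new leaf to node q (0-based)."""
    new_row = [d + 1 for d in dist[q]]          # d_{N+1,i} = d_{q,i} + 1
    grown = [row + [new_row[i]] for i, row in enumerate(dist)]
    grown.append(new_row + [0])
    return grown


def invariant(dist, ks=KS):
    """One sorted (b, v) matrix per k, packed into a hashable tuple."""
    return tuple(
        tuple(sorted((sum(k ** d for d in row), row.count(1)) for row in dist))
        for k in ks
    )


def count_trees(n_max, ks=KS):
    """Number of distinct invariants for N = 1..n_max (a lower bound on T)."""
    generation = {invariant([[0]], ks): [[0]]}  # the single-node tree
    counts = [1]
    for _ in range(1, n_max):
        next_generation = {}
        for dist in generation.values():
            for q in range(len(dist)):          # attach a leaf to every node
                grown = attach_node(dist, q)
                # Keep the first representative of each invariant class.
                next_generation.setdefault(invariant(grown, ks), grown)
        generation = next_generation
        counts.append(len(generation))
    return counts


print(count_trees(8))
```

For n_max = 8 the sketch should reproduce the first eight entries of the T row of Table 1. A dictionary keyed by the invariant replaces the explicit pairwise matrix comparison; this is a convenience of the sketch, not something stated in the text.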



Technically, the sorting with a key is an implementation of the quicksort
algorithm [8], while the comparison of two (b, v) matrices is realized with the
standard C++ STL library [9].
3 Results of simulations

The numbers of trees T obtained with the above algorithm with k ≤ 5 are given
in Table 1. The results agree with the available numbers of trees given in Refs.
[2,10]. For example, all T = 47 trees of N = 9 nodes are presented in Fig. 3.

For large enough N the number of trees T is asymptotically given as

$$T_O(N) = \beta \, \alpha^{N} N^{-5/2}, \tag{1}$$

where α = 2.9557652856... and β = 0.5349496061... [7]. The comparison of
the results of the exact tree counting and the predictions of Eq. (1) is shown in
Table 1 and in Fig. 4.
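
Eq. (1) is simple to evaluate numerically. The following sketch uses the constants quoted above, truncated to the digits given in the text; the function name `otter_estimate` is an assumption of this example.

```python
ALPHA = 2.9557652856  # Otter's constant alpha, truncated as quoted in the text
BETA = 0.5349496061   # Otter's constant beta, truncated as quoted in the text


def otter_estimate(n):
    """Asymptotic number of unlabeled trees, T_O(N) = beta * alpha**N * N**(-5/2)."""
    return BETA * ALPHA ** n * n ** -2.5


# Compare with a few exact counts T taken from Table 1.
for n, exact in [(5, 3), (10, 106), (17, 48629)]:
    estimate = otter_estimate(n)
    print(f"N = {n:2d}  T = {exact:6d}  T_O = {estimate:9.1f}  T/T_O = {exact / estimate:.3f}")
```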

In the terminology of Ref. [4] the degree vector v is called the valence vector. The
molecular topological index (MTI) is defined as (see footnote 1)

$$\mathrm{MTI} = \| v (A + D) \|, \tag{2}$$

where A is the graph's adjacency matrix and the vector norm $\| \cdot \|$ is defined as
the sum of the absolute values of the vector's elements,

$$\| c \| = \| (c_1, c_2, \dots, c_{N-1}, c_N) \| = \sum_{i=1}^{N} |c_i| .$$

In the adjacency matrix A the element $a_{ij}$ gives the number of edges between nodes i
and j. For simple graphs, where multiple edges are forbidden, the matrix A
becomes binary.

Footnote 1: MTI was originally defined as a simple sum of the elements of the product
v(A + D) and not as a sum of the absolute values of its elements. As the elements of
v(A + D) are always positive, our description is only more formally compact.
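
Eq. (2) can be evaluated from the distance matrix alone, because in a tree two nodes are adjacent exactly when their distance is 1. The sketch below is an illustration with a hypothetical function name `mti`; the two test inputs are the N = 4 path and star used in Section 2.

```python
def mti(dist):
    """Molecular topological index ||v (A + D)|| of a tree.

    dist is the distance matrix D; the adjacency matrix A and the degree
    (valence) vector v are derived from the entries of D equal to 1.
    """
    n = len(dist)
    v = [row.count(1) for row in dist]
    adj = [[1 if d == 1 else 0 for d in row] for row in dist]
    # Row vector times matrix: component j is sum_i v_i * (A + D)_ij.
    product = [
        sum(v[i] * (adj[i][j] + dist[i][j]) for i in range(n))
        for j in range(n)
    ]
    return sum(abs(c) for c in product)


path4 = [[0, 1, 2, 3], [1, 0, 1, 2], [2, 1, 0, 1], [3, 2, 1, 0]]
star4 = [[0, 1, 2, 2], [1, 0, 1, 1], [2, 1, 0, 2], [2, 1, 2, 0]]
print(mti(path4), mti(star4))  # 38 36
```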

Fig. 3. All T = 47 trees with N = 9 nodes.

Fig. 4. Number of trees T as compared with Otter's formula T_O.

MTI was believed to be a single-number quantity that allows one to distinguish
between trees [4]. Here, however, we can see that this method of counting fails
for N ≥ 8. The number of trees obtained with the Schultz method is T_S(8) = 20,
while the true value is T(8) = 23. Three pairs of trees which have the same MTI
but different $(b, v)_8$ are shown in Fig. 5.

The purpose of introducing MTI was to differentiate chemical molecules.
When a carbon atom (with the proper number of hydrogen atoms) is assigned
to each node of the trees shown in Fig. 5(c)-(f), they may represent the
semi-structural formulas of (c) 2,2,4-trimethylpentane, (d) 3-ethyl-2-methylpentane,
(e) 2,2-dimethylhexane and (f) 3-ethylhexane [11]. The MTI cannot distinguish
between the pairs (c,d) and (e,f) of these forms of C8H18.



Our results contain not only the numbers of trees but also the structures of all of
them. Binary files with the distance matrices, and the program converting them
to input files for the Pajek program [12], are available from our web page [13].

Fig. 5. Three pairs of topologically different trees with N = 8 nodes and the same
MTI: (a), (b) MTI = 230; (c), (d) MTI = 242; (e), (f) MTI = 260.

4 Discussion 

We now show that for large N the range of any discriminative
topological invariant with integer values should increase exponentially with N.
If the invariant is discriminative, a different value must be assigned to each
tree. According to Eq. (1) the number of trees grows exponentially with N, so
we get an exponentially increasing number of different integer values. The
length of the range on an axis where these values can be placed must therefore
also increase at least exponentially, which completes the proof. We
note that the matrix character of the invariant does not change this result,
as long as the matrix size increases as $N^c$, where c is a constant. In our case
c = 1, because the matrix is N x 2. We should add that this 'range criterion'
is crucial in the asymptotic regime of large N. Up to now, computational
resources have not allowed us to explore this regime.

In conclusion, we have proposed a new topological invariant to discriminate
unlabeled trees. The matrix character of the invariant gives reason to believe that
its discriminating power is much better than that of the scalar invariants
proposed previously.